Justifying and Generalizing Contrastive Divergence

Authors

  • Yoshua Bengio
  • Olivier Delalleau
Abstract

We study an expansion of the log likelihood in undirected graphical models such as the restricted Boltzmann machine (RBM), where each term in the expansion is associated with a sample in a Gibbs chain alternating between two random variables (the visible vector and the hidden vector in RBMs). We are particularly interested in estimators of the gradient of the log likelihood obtained through this expansion. We show that its residual term converges to zero, justifying the use of a truncation (running only a short Gibbs chain), which is the main idea behind the contrastive divergence (CD) estimator of the log-likelihood gradient. By truncating even more, we obtain a stochastic reconstruction error, related through a mean-field approximation to the reconstruction error often used to train autoassociators and stacked autoassociators. The derivation is not specific to the particular parametric forms used in RBMs and requires only convergence of the Gibbs chain. We present theoretical and empirical evidence linking the number of Gibbs steps k and the magnitude of the RBM parameters to the bias in the CD estimator. These experiments also suggest that the sign of the CD estimator is correct most of the time, even when the bias is large, so that CD-k is a good descent direction even for small k.
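As a concrete sketch of the truncation idea (a short, k-step Gibbs chain in place of a converged one), the following NumPy code performs one CD-k update for a binary RBM. This is a minimal illustration under common conventions, not code from the paper; every name here (cd_k_update, sample_bernoulli, the learning rate lr) is an assumption made for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample_bernoulli(p):
        # Draw 0/1 samples with probabilities p (illustrative helper).
        return (rng.random(p.shape) < p).astype(p.dtype)

    def cd_k_update(W, b, c, v0, k=1, lr=0.01):
        # One CD-k update from a minibatch v0 of visible vectors.
        # W: (n_visible, n_hidden) weights; b, c: visible and hidden biases.
        ph0 = sigmoid(v0 @ W + c)          # positive phase: p(h | data)
        h = sample_bernoulli(ph0)
        for _ in range(k):                 # truncated alternating Gibbs chain
            v = sample_bernoulli(sigmoid(h @ W.T + b))
            ph = sigmoid(v @ W + c)
            h = sample_bernoulli(ph)
        n = v0.shape[0]
        # CD-k gradient estimate: data statistics minus k-step chain statistics.
        W += lr * (v0.T @ ph0 - v.T @ ph) / n
        b += lr * (v0 - v).mean(axis=0)
        c += lr * (ph0 - ph).mean(axis=0)
        return W, b, c

    # Tiny usage example on random binary data.
    W = 0.01 * rng.standard_normal((6, 4))
    b, c = np.zeros(6), np.zeros(4)
    batch = (rng.random((8, 6)) < 0.5).astype(float)
    W, b, c = cd_k_update(W, b, c, batch, k=1)

With small k the chain is far from its stationary distribution, which is exactly the source of the bias the abstract analyzes; increasing k trades computation for a smaller bias.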

Similar Articles

Why (and When and How) Contrastive Divergence Works

Contrastive divergence (CD) is a promising method of inference in high-dimensional distributions with intractable normalizing constants; however, the theoretical foundations justifying its use are somewhat weak. This document proposes a framework for understanding CD inference, including how and when it works. It provides multiple justifications for the CD moment conditions, including framing t...

Differential Contrastive Divergence

We formulate a differential version of contrastive divergence for continuous configuration spaces by considering a limit of MCMC processes in which the proposal distribution becomes infinitesimal. This leads to a deterministic differential contrastive divergence update — one in which no stochastic sampling is required. We prove convergence of differential contrastive divergence in general and p...

Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence

Contrastive Divergence (CD) and Persistent Contrastive Divergence (PCD) are popular methods for training Restricted Boltzmann Machines. However, both methods use an approximate method for sampling from the model distribution. As a side effect, these approximations yield significantly different biases and variances for stochastic gradient estimates of individual data points. It is well known tha...
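To make the contrast concrete: in CD the negative-phase Gibbs chain is restarted at the data on every update, whereas in PCD it resumes from a persistent "fantasy" state carried across updates, which is what changes the bias and variance of the gradient estimates. Below is a minimal, hedged sketch of a PCD-k update in the same NumPy style as the CD-k example above; the names (pcd_update, v_persist) are illustrative, not from the paper.

    import numpy as np

    rng = np.random.default_rng(1)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    bernoulli = lambda p: (rng.random(p.shape) < p).astype(p.dtype)

    def pcd_update(W, b, c, v0, v_persist, k=1, lr=0.01):
        # Positive phase comes from the data batch, as in CD.
        ph0 = sigmoid(v0 @ W + c)
        # Negative phase: resume the chain from the persistent state
        # instead of re-initializing it at the data.
        v = v_persist
        for _ in range(k):
            h = bernoulli(sigmoid(v @ W + c))
            v = bernoulli(sigmoid(h @ W.T + b))
        ph = sigmoid(v @ W + c)
        n = v0.shape[0]
        W = W + lr * (v0.T @ ph0 - v.T @ ph) / n
        b = b + lr * (v0 - v).mean(axis=0)
        c = c + lr * (ph0 - ph).mean(axis=0)
        return W, b, c, v   # carry the chain state into the next update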

Average Contrastive Divergence for Training Restricted Boltzmann Machines

This paper studies the contrastive divergence (CD) learning algorithm and proposes a new algorithm for training restricted Boltzmann machines (RBMs). We show that CD is a biased estimator of the log-likelihood gradient and analyze the bias. We then propose a new learning algorithm, called average contrastive divergence (ACD), for training RBMs. It is an improved CD algorith...

Dissimilarity Based Contrastive Divergence for Anomaly Detection

This paper describes training a Restricted Boltzmann Machine (RBM) with dissimilarity-based contrastive divergence to obtain an anomaly detector. We discuss the merits of the method relative to other approaches and describe its usefulness for obtaining a generative model.

Journal:
  • Neural Computation

Volume 21, Issue 6

Pages: -

Published: 2009